Delta Lake
16 items using Delta Lake
Projects
Blog Posts
Debezium Series, Part 5: Sink Connectors — Delta Lake & Iceberg
Landing CDC events into open table formats. Upsert and delete semantics with Delta Lake MERGE, Iceberg MERGE INTO, partition strategies, and the JDBC sink for relational targets.
Databricks Series, Part 6: ML Serving and Workflows
Batch and real-time model inference, Databricks Model Serving endpoints, and orchestrating the full ML pipeline with Databricks Workflows.
Databricks Series, Part 5: Machine Learning with MLflow
Tracking experiments, logging models and artifacts, comparing runs, and managing the model lifecycle with MLflow on Databricks.
Databricks Series, Part 4: Feature Engineering at Scale
Databricks Feature Store, FeatureEngineeringClient, FeatureLookup, training sets, and eliminating training-serving skew.
Databricks Series, Part 3: Data Ingestion with Auto Loader
cloudFiles format, schema inference, schema evolution, and building robust incremental ingestion pipelines on Databricks.
Databricks Series, Part 2: Lakehouse Architecture
Unity Catalog for governance and discovery, the medallion Bronze/Silver/Gold pattern, and Delta tables as the storage foundation.
Databricks Series, Part 1: Getting Started
Navigating the Databricks workspace, launching clusters, writing notebooks, and submitting your first PySpark job.
Databricks Series, Part 0: Overview
The lakehouse platform concept, what Databricks adds on top of Spark and Delta Lake, and how it compares to alternatives.
Delta Lake Series, Part 6: Streaming & CDC
Writing to Delta with Structured Streaming, exactly-once guarantees, reading Delta as a stream, and Change Data Feed for downstream propagation.
Delta Lake Series, Part 5: Performance Optimization
Making Delta Lake queries fast: OPTIMIZE, Z-ordering, data skipping with column statistics, compaction, and partitioning strategies.
Delta Lake Series, Part 4: Time Travel & Versioning
Querying historical snapshots by version or timestamp, rolling back bad writes, auditing the table history, and managing retention with VACUUM.
Delta Lake Series, Part 3: Schema Enforcement & Evolution
How Delta Lake validates schemas on write, rejects incompatible data, and handles controlled schema changes over time.
Delta Lake Series, Part 2: Transaction Log & ACID
How the Delta Lake transaction log enables atomicity, serializable isolation, optimistic concurrency, and conflict resolution.
Delta Lake Series, Part 1: Getting Started
Creating Delta tables, reading and writing with Spark, Delta SQL, and what the _delta_log looks like in practice.
Delta Lake Series, Part 0: Overview
The data lake reliability problem, what Delta Lake adds on top of Parquet, and how it compares to Apache Iceberg and Apache Hudi.